Goto

Collaborating Authors

 Programming Languages


Absolute Zero: Reinforced Self-play Reasoning with Zero Data

Neural Information Processing Systems

Reinforcement learning with verifiable rewards (RLVR) has shown promise in enhancing the reasoning capabilities of large language models by learning directly from rule-based outcome rewards. Recent RLVR works that operate under the zero setting avoid supervision in labeling the reasoning process, but still depend on manually curated collections of questions and answers for training. The scarcity of high-quality, human-produced examples raises concerns about the long-term scalability of relying on human supervision, a challenge already evident in the domain of language model pretraining. Furthermore, in a hypothetical future where AI surpasses human intelligence, tasks provided by humans may offer limited learning potential for a superintelligent system. To address these concerns, we propose a new RLVR paradigm called Absolute Zero, in which a single model learns to propose tasks that maximize its own learning progress and improves reasoning by solving them, without relying on any external human or distillation data. Under this paradigm, we introduce the Absolute Zero Reasoner (AZR), a system that self-evolves its training curriculum and reasoning ability. AZR uses a code executor to both validate self-proposed code reasoning tasks and verify answers, serving as an unified source of verifiable feedback to guide open-ended yet grounded learning. Despite being trained entirely without external data, AZR achieves overall SOTA performance on coding and mathematical reasoning tasks, outperforming existing zero-setting models that rely on tens of thousands of in-domain human-curated examples. Furthermore, we demonstrate that AZR can be effectively applied across different model scales and is compatible with various model classes.



Sheetpedia: A300K-Spreadsheet Corpus for Spreadsheet Intelligence and LLMFine-Tuning

Neural Information Processing Systems

Spreadsheets are widely used for data analysis and reporting, yet their complex structure and formula logic pose significant challenges for AI systems. We introduce Sheetpedia, a large-scale corpus of over 290,000 diverse spreadsheets (from 324,000+ workbooks) compiled from enterprise email archives and online forums. We detail a rigorous collection and preprocessing pipeline (integrating the Enron email spreadsheet archive and the Fuse web corpus, plus a new crawl of Excel forums) to standardize formats, filter languages, and remove duplicates. Sheetpedia provides extensive coverage of real formulas and annotations - addressing a gap left by prior table datasets (e.g.


Counterfactual Evolution of Multimodal Datasets via Visual Programming

Neural Information Processing Systems

The rapid development of Multimodal Large Language Models (MLLMs) poses increasing demands on the diversity and complexity of multimodal datasets. Yet manual annotation pipelines can no longer keep pace. Existing augmentation methods often follow fixed rules and lack verifiable control over sample diversity and reasoning complexity. To address this, we introduce Scalable COunterfactual Program Evolution (SCOPE), a framework that uses symbolic Visual Programming to guide program evolution via counterfactual reasoning. SCOPE performs the three steps of counterfactual inference: (1) Abduction, by generating verifiable programs to model reasoning associations; (2) Action, by intervening on program structure along three axes--reasoning path, visual context, and cross-instance composition; and (3) Prediction, by categorizing evolved instances by difficulty, structure, and input multiplicity. Based on this process, we build SCOPE-Train and SCOPE-Test, evolving benchmarks with expert validation. To support training, we propose MAP, a curriculum learning strategy that aligns model capacity with sample difficulty. Experiments show that SCOPEimproves reasoning performance, exposes model blind spots, and enhances visual dialog capabilities.


Visual Studio Pro 2026 is down to just 32.97 through 6/28 for Deal Days

PCWorld

When you purchase through links in our articles, we may earn a small commission. Visual Studio Pro 2026 is down to just $32.97 through 6/28 for Deal Days Microsoft Visual Studio Professional 2026 is on sale for $32.97 (reg. Developers love arguing about programming languages, frameworks, tabs versus spaces, and basically everything else. One thing they tend to agree on? Saving nearly $470 on a professional development environment is a pretty good idea.


Multi-SWE-bench: AMultilingual Benchmark for Issue Resolving

Neural Information Processing Systems

The task of issue resolving aims to modify a codebase to generate a patch that addresses a given issue. However, most existing benchmarks focus almost exclusively on Python, making them insufficient for evaluating Large Language Models (LLMs) across different programming languages. To bridge this gap, we introduce a multilingual issue-resolving benchmark, called Multi-SWE-bench, covering 8 widely used programming languages: Python, Java, TypeScript, JavaScript, Go, Rust, C, and C++. In particular, this benchmark includes a total of 2,132 highquality instances, carefully curated by 68 expert annotators, ensuring a reliable and accurate evaluation of LLMs on the issue-resolving task. Based on humanannotated results, the issues are further classified into three difficulty levels. We evaluate a series of state-of-the-art models on Multi-SWE-bench, utilizing both procedural and agent-based frameworks for issue resolving. Experimental results based on Multi-SWE-bench reveal three key findings: (1) Limited generalization across languages: While existing LLMs perform well on Python issues, their ability to generalize across other languages remains limited; (2) Performance aligned with human-annotated difficulty: LLM-based agents' performance closely aligns with human-assigned difficulty, with resolved rates notably decreasing as issue complexity rises; and (3) Performance drop on cross-file issues: The performance of current methods significantly deteriorates when handling cross-file issues. These findings highlight the limitations of current LLMs and underscore the need for more robust models capable of handling a broader range of programming languages and complex issue scenarios.


Adversarial generalization of unfolding (model-based) networks

Neural Information Processing Systems

Unfolding networks are interpretable networks emerging from iterative algorithms, incorporate prior knowledge of data structure, and are designed to solve inverse problems like compressed sensing, which deals with recovering data from noisy, missing observations. Compressed sensing finds applications in critical domains, from medical imaging to cryptography, where adversarial robustness is crucial to prevent catastrophic failures. However, a solid theoretical understanding of the performance of unfolding networks in the presence of adversarial attacks is still in its infancy. In this paper, we study the adversarial generalization of unfolding networks when perturbed with $l_2$-norm constrained attacks, generated by the fast gradient sign method. Particularly, we choose a family of state-of-the-art overaparameterized unfolding networks and deploy a new framework to estimate their adversarial Rademacher complexity. Given this estimate, we provide adversarial generalization error bounds for the networks under study, which are tight with respect to the attack level. To our knowledge, this is the first theoretical analysis on the adversarial generalization of unfolding networks. We further present a series of experiments on real-world data, with results corroborating our derived theory, consistently for all data. Finally, we observe that the family's overparameterization can be exploited to promote adversarial robustness, shedding light on how to efficiently robustify neural networks.


CUDA Proves Nvidia Is a Software Company

WIRED

There's a deep, forbidding moat that surrounds Nvidia--and it has nothing to do with hardware. Forgive me for starting with a cliché, a piece of finance jargon that has recently slipped into the tech lexicon, but I'm afraid I must talk about "moats." Popularized decades ago by Warren Buffett to refer to a company's competitive advantage, the word found its way into Silicon Valley pitch decks when a memo purportedly leaked from Google, titled "We Have No Moat, and Neither Does OpenAI," fretted that open-source AI would pillage Big Tech's castle. A few years on, the castle walls remain safe. Apart from a brief bout of panic when DeepSeek first appeared, open-source AI models have not vastly outperformed proprietary models.


Dangerous New Linux Exploit Gives Attackers Root Access to Countless Computers

WIRED

The exploit, dubbed CopyFail and tracked as CVE-2026-31431, allows hackers to take over PCs and data center servers. The Linux vulnerabilities have been patched--but many machines remain at risk. Publicly released exploit code for an effectively unpatched vulnerability that gives root access to virtually all releases of Linux is setting off alarm bells as defenders scramble to ward off severe compromises inside data centers and on personal devices. The vulnerability and exploit code that exploits it were released Wednesday evening by researchers from security firm Theori, five weeks after privately disclosing it to the Linux kernel security team. The critical flaw, tracked as CVE-2026-31431 and the name CopyFail, is a local privilege escalation, a vulnerability class that allows unprivileged users to elevate themselves to administrators.